<!DOCTYPE html>
<html lang="en-us">
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    
<meta charset="UTF-8">
<title>Theory Behind Relevance Scoring | Elasticsearch: The Definitive Guide [2.x] | Elastic</title>
<link rel="home" href="index.html" title="Elasticsearch: The Definitive Guide [2.x]">
<link rel="up" href="controlling-relevance.html" title="Controlling Relevance">
<link rel="prev" href="controlling-relevance.html" title="Controlling Relevance">
<link rel="next" href="practical-scoring-function.html" title="Lucene’s Practical Scoring Function">
<meta name="DC.type" content="Learn/Docs/Legacy/Elasticsearch/Definitive Guide/2.x">
<meta name="DC.subject" content="Elasticsearch">
<meta name="DC.identifier" content="2.x">
<meta name="robots" content="noindex,nofollow">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <script src="https://cdn.optimizely.com/js/18132920325.js"></script>
    <link rel="apple-touch-icon" sizes="57x57" href="/apple-icon-57x57.png">
    <link rel="apple-touch-icon" sizes="60x60" href="/apple-icon-60x60.png">
    <link rel="apple-touch-icon" sizes="72x72" href="/apple-icon-72x72.png">
    <link rel="apple-touch-icon" sizes="76x76" href="/apple-icon-76x76.png">
    <link rel="apple-touch-icon" sizes="114x114" href="/apple-icon-114x114.png">
    <link rel="apple-touch-icon" sizes="120x120" href="/apple-icon-120x120.png">
    <link rel="apple-touch-icon" sizes="144x144" href="/apple-icon-144x144.png">
    <link rel="apple-touch-icon" sizes="152x152" href="/apple-icon-152x152.png">
    <link rel="apple-touch-icon" sizes="180x180" href="/apple-icon-180x180.png">
    <link rel="icon" type="image/png" href="/favicon-32x32.png" sizes="32x32">
    <link rel="icon" type="image/png" href="/android-chrome-192x192.png" sizes="192x192">
    <link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">
    <link rel="icon" type="image/png" href="/favicon-16x16.png" sizes="16x16">
    <link rel="manifest" href="/manifest.json">
    <meta name="apple-mobile-web-app-title" content="Elastic">
    <meta name="application-name" content="Elastic">
    <meta name="msapplication-TileColor" content="#ffffff">
    <meta name="msapplication-TileImage" content="/mstile-144x144.png">
    <meta name="theme-color" content="#ffffff">
    <meta name="naver-site-verification" content="936882c1853b701b3cef3721758d80535413dbfd">
    <meta name="yandex-verification" content="d8a47e95d0972434">
    <meta name="localized" content="true">
    <meta name="st:robots" content="follow,index">
    <meta property="og:image" content="https://www.elastic.co/static/images/elastic-logo-200.png">
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
    <link rel="icon" href="/favicon.ico" type="image/x-icon">
    <link rel="apple-touch-icon-precomposed" sizes="64x64" href="/favicon_64x64_16bit.png">
    <link rel="apple-touch-icon-precomposed" sizes="32x32" href="/favicon_32x32.png">
    <link rel="apple-touch-icon-precomposed" sizes="16x16" href="/favicon_16x16.png">
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
    <link rel="stylesheet" type="text/css" href="/guide/static/styles.css">
  </head>

  <!--© 2015-2021 Elasticsearch B.V. Copying, publishing and/or distributing without written permission is strictly prohibited.-->

  <body>
    <!-- Google Tag Manager -->
    <script>dataLayer = [];</script><noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-58RLH5" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-58RLH5');</script>
    <!-- End Google Tag Manager -->

    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-12395217-16"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'UA-12395217-16');
    </script>

    <!--BEGIN QUALTRICS WEBSITE FEEDBACK SNIPPET-->
    <script type="text/javascript">
      (function(){var g=function(e,h,f,g){
      this.get=function(a){for(var a=a+"=",c=document.cookie.split(";"),b=0,e=c.length;b<e;b++){for(var d=c[b];" "==d.charAt(0);)d=d.substring(1,d.length);if(0==d.indexOf(a))return d.substring(a.length,d.length)}return null};
      this.set=function(a,c){var b="",b=new Date;b.setTime(b.getTime()+6048E5);b="; expires="+b.toGMTString();document.cookie=a+"="+c+b+"; path=/; "};
      this.check=function(){var a=this.get(f);if(a)a=a.split(":");else if(100!=e)"v"==h&&(e=Math.random()>=e/100?0:100),a=[h,e,0],this.set(f,a.join(":"));else return!0;var c=a[1];if(100==c)return!0;switch(a[0]){case "v":return!1;case "r":return c=a[2]%Math.floor(100/c),a[2]++,this.set(f,a.join(":")),!c}return!0};
      this.go=function(){if(this.check()){var a=document.createElement("script");a.type="text/javascript";a.src=g;document.body&&document.body.appendChild(a)}};
      this.start=function(){var a=this;window.addEventListener?window.addEventListener("load",function(){a.go()},!1):window.attachEvent&&window.attachEvent("onload",function(){a.go()})}};
      try{(new g(100,"r","QSI_S_ZN_emkP0oSe9Qrn7kF","https://znemkp0ose9qrn7kf-elastic.siteintercept.qualtrics.com/WRSiteInterceptEngine/?Q_ZID=ZN_emkP0oSe9Qrn7kF")).start()}catch(i){}})();
    </script><div id="ZN_emkP0oSe9Qrn7kF"><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>
    <!--END WEBSITE FEEDBACK SNIPPET-->

    <div id="elastic-nav" style="display:none;"></div>
    <script src="https://www.elastic.co/elastic-nav.js"></script>

    <!-- Subnav -->
    <div>
      <div>
        <div class="tertiary-nav d-none d-md-block">
          <div class="container">
            <div class="p-t-b-15 d-flex justify-content-between nav-container">
              <div class="breadcrum-wrapper"><span><a href="/guide/" style="font-size: 14px; font-weight: 600; color: #000;">Docs</a></span></div>
            </div>
          </div>
        </div>
      </div>
    </div>

    <div class="main-container">
      <section id="content">
        <div class="content-wrapper">

          <section id="guide" lang="en">
            <div class="container">
              <div class="row">
                <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                  <!-- start body -->
                  <div class="page_header">
<p>
  <strong>WARNING</strong>: The 2.x versions of Elasticsearch have passed their
  <a href="https://www.elastic.co/support/eol">EOL dates</a>. If you are running
  a 2.x version, we strongly advise you to upgrade.
</p>
<p>
  This documentation is no longer maintained and may be removed. For the latest
  information, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html">current
  Elasticsearch documentation</a>.
</p>
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch: The Definitive Guide [2.x]</a></span>
»
<span class="breadcrumb-link"><a href="search-in-depth.html">Search in Depth</a></span>
»
<span class="breadcrumb-link"><a href="controlling-relevance.html">Controlling Relevance</a></span>
»
<span class="breadcrumb-node">Theory Behind Relevance Scoring</span>
</div>
<div class="navheader">
<span class="prev">
<a href="controlling-relevance.html">« Controlling Relevance</a>
</span>
<span class="next">
<a href="practical-scoring-function.html">Lucene’s Practical Scoring Function »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="scoring-theory"></a>Theory Behind Relevance Scoring<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h2>
</div></div></div>
<p>Lucene (and thus Elasticsearch) uses the
<a href="http://en.wikipedia.org/wiki/Standard_Boolean_model" class="ulink" target="_top"><em>Boolean model</em></a>
to find matching documents, and a formula called the
<a class="xref" href="practical-scoring-function.html" title="Lucene’s Practical Scoring Function"><em>practical scoring function</em></a>
to calculate relevance.  This formula borrows concepts from
<a href="http://en.wikipedia.org/wiki/Tfidf" class="ulink" target="_top"><em>term frequency/inverse document frequency</em></a> and the
<a href="http://en.wikipedia.org/wiki/Vector_space_model" class="ulink" target="_top"><em>vector space model</em></a>
but adds more-modern features like a coordination factor, field length
normalization, and term or query clause boosting.</p>
<div class="note admon">
<div class="icon"></div>
<div class="admon_content">
<p>Don’t be alarmed!  These concepts are not as complicated as the names make
them appear. While this section mentions algorithms, formulae, and mathematical
models, it is intended for consumption by mere humans.  Understanding the
algorithms themselves is not as important as understanding the factors that
influence the outcome.</p>
</div>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="boolean-model"></a>Boolean Model<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h3>
</div></div></div>
<p>The <em>Boolean model</em> simply applies the <code class="literal">AND</code>, <code class="literal">OR</code>, and <code class="literal">NOT</code> conditions
expressed in the query to find all the documents that match. A query for</p>
<pre class="literallayout">full AND text AND search AND (elasticsearch OR lucene)</pre>

<p>will include only documents that contain all of the terms <code class="literal">full</code>, <code class="literal">text</code>, and
<code class="literal">search</code>, and either <code class="literal">elasticsearch</code> or <code class="literal">lucene</code>.</p>
<p>This process is simple and fast.  It is used to exclude any documents that
cannot possibly match the query.</p>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="tfidf"></a>Term Frequency/Inverse Document Frequency (TF/IDF)<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h3>
</div></div></div>
<p>Once we have a list of matching documents, they need to be ranked by
relevance. Not all documents will contain all the terms, and some terms are
more important than others. The relevance score of the whole document
depends (in part) on the <em>weight</em> of each query term that appears in
that document.</p>
<p>The weight of a term is determined by three factors, which we already
introduced in <a class="xref" href="relevance-intro.html" title="What Is Relevance?">What Is Relevance?</a>. The formulae are included for interest’s
sake, but you are not required to remember them.</p>
<div class="section">
<div class="titlepage"><div><div>
<h4 class="title">
<a id="tf"></a>Term frequency<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h4>
</div></div></div>
<p>How often does the term appear in this document? The more often, the
<em>higher</em> the weight.  A field containing five mentions of the same term is
more likely to be relevant than a field containing just one mention.
The term frequency is calculated as follows:</p>
<pre class="literallayout">tf(t in d) = √frequency <a id="CO101-1"></a><i class="conum" data-value="1"></i></pre>

<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO101-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>The term frequency (<code class="literal">tf</code>) for term <code class="literal">t</code> in document <code class="literal">d</code> is the square root
of the number of times the term appears in the document.</p>
</td>
</tr>
</table>
</div>
<p>If you don’t care about how often a term appears in a field, and all you care
about is that the term is present, then you can disable term frequencies in
the field mapping:</p>
<div class="pre_wrapper lang-json">
<pre class="programlisting prettyprint lang-json">PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type":          "string",
          "index_options": "docs" <a id="CO102-1"></a><i class="conum" data-value="1"></i>
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO102-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>Setting <code class="literal">index_options</code> to <code class="literal">docs</code> will disable term frequencies and term
positions. A field with this mapping will not count how many times a term
appears, and will not be usable for phrase or proximity queries.
Exact-value <code class="literal">not_analyzed</code> string fields use this setting by default.</p>
</td>
</tr>
</table>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h4 class="title">
<a id="idf"></a>Inverse document frequency<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h4>
</div></div></div>
<p>How often does the term appear in all documents in the collection?  The more
often, the <em>lower</em> the weight. Common terms like <code class="literal">and</code> or <code class="literal">the</code> contribute
little to relevance, as they appear in most documents, while uncommon terms
like <code class="literal">elastic</code> or <code class="literal">hippopotamus</code> help us zoom in on the most interesting
documents. The inverse document frequency is calculated as follows:</p>
<pre class="literallayout">idf(t) = 1 + log ( numDocs / (docFreq + 1)) <a id="CO103-1"></a><i class="conum" data-value="1"></i></pre>

<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO103-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>The inverse document frequency (<code class="literal">idf</code>) of term <code class="literal">t</code> is the
logarithm of the number of documents in the index, divided by
the number of documents that contain the term.</p>
</td>
</tr>
</table>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h4 class="title">
<a id="field-norm"></a>Field-length norm<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h4>
</div></div></div>
<p>How long is the field?  The shorter the field, the <em>higher</em> the weight. If a
term appears in a short field, such as a <code class="literal">title</code> field, it is more likely that
the content of that field is <em>about</em> the term than if the same term appears
in a much bigger <code class="literal">body</code> field. The field length norm is calculated as follows:</p>
<pre class="literallayout">norm(d) = 1 / √numTerms <a id="CO104-1"></a><i class="conum" data-value="1"></i></pre>

<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO104-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>The field-length norm (<code class="literal">norm</code>) is the inverse square root of the number of terms
in the field.</p>
</td>
</tr>
</table>
</div>
<p>While the field-length norm is important for full-text search, many other
fields don’t need norms. Norms consume approximately 1 byte per <code class="literal">string</code> field
per document in the index, whether or not a document contains the field.  Exact-value <code class="literal">not_analyzed</code> string fields have norms disabled by default,
but you can use the field mapping to disable norms on <code class="literal">analyzed</code> fields as
well:</p>
<div class="pre_wrapper lang-json">
<pre class="programlisting prettyprint lang-json">PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "norms": { "enabled": false } <a id="CO105-1"></a><i class="conum" data-value="1"></i>
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO105-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>This field will not take the field-length norm into account.  A long field
and a short field will be scored as if they were the same length.</p>
</td>
</tr>
</table>
</div>
<p>For use cases such as logging, norms are not useful.  All you care about is
whether a field contains a particular error code or a particular browser
identifier. The length of the field does not affect the outcome.  Disabling
norms can save a significant amount of memory.</p>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h4 class="title">
<a id="_putting_it_together"></a>Putting it together<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h4>
</div></div></div>
<p>These three factors—​term frequency, inverse document frequency, and field-length norm—​are calculated and stored at index time.  Together, they are
used to calculate the <em>weight</em> of a single term in a particular document.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>When we refer to <em>documents</em> in the preceding formulae, we are actually talking about
a field within a document.  Each field has its own inverted index and thus,
for TF/IDF purposes, the value of the field is the value of the document.</p>
</div>
</div>
<p>When we run a simple <code class="literal">term</code> query with <code class="literal">explain</code> set to <code class="literal">true</code> (see
<a class="xref" href="relevance-intro.html#explain" title="Understanding the Score">Understanding the Score</a>), you will see that the only factors involved in calculating the
score are the ones explained in the preceding sections:</p>
<div class="pre_wrapper lang-json pagebreak-before">
<pre class="programlisting prettyprint lang-json pagebreak-before">PUT /my_index/doc/1
{ "text" : "quick brown fox" }

GET /my_index/doc/_search?explain
{
  "query": {
    "term": {
      "text": "fox"
    }
  }
}</pre>
</div>
<p>The (abbreviated) <code class="literal">explanation</code> from the preceding request is as follows:</p>
<pre class="literallayout">weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641 <a id="CO106-1"></a><i class="conum" data-value="1"></i>
result of:
    fieldWeight in 0                         0.15342641
    product of:
        tf(freq=1.0), with freq of 1:        1.0 <a id="CO106-2"></a><i class="conum" data-value="2"></i>
        idf(docFreq=1, maxDocs=1):           0.30685282 <a id="CO106-3"></a><i class="conum" data-value="3"></i>
        fieldNorm(doc=0):                    0.5 <a id="CO106-4"></a><i class="conum" data-value="4"></i></pre>

<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO106-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>The final <code class="literal">score</code> for term <code class="literal">fox</code> in field <code class="literal">text</code> in the document with internal
Lucene doc ID <code class="literal">0</code>.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO106-2"><i class="conum" data-value="2"></i></a></p>
</td>
<td align="left" valign="top">
<p>The term <code class="literal">fox</code> appears once in the <code class="literal">text</code> field in this document.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO106-3"><i class="conum" data-value="3"></i></a></p>
</td>
<td align="left" valign="top">
<p>The inverse document frequency of <code class="literal">fox</code> in the <code class="literal">text</code> field in all
documents in this index.</p>
</td>
</tr>
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO106-4"><i class="conum" data-value="4"></i></a></p>
</td>
<td align="left" valign="top">
<p>The field-length normalization factor for this field.</p>
</td>
</tr>
</table>
</div>
<p>Of course, queries usually consist of more than one term, so we need a
way of combining the weights of multiple terms.  For this, we turn to the
vector space model.</p>
</div>

</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="vector-space-model"></a>Vector Space Model<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch-definitive-guide/edit/2.x/170_Relevance/10_Scoring_theory.asciidoc">edit</a>
</h3>
</div></div></div>
<p>The <em>vector space model</em> provides a way of comparing a multiterm query
against a document. The output is a single score that represents how well the
document matches the query.  In order to do this, the model represents both the document
and the query as <em>vectors</em>.</p>
<p>A vector is really just a one-dimensional array containing numbers, for example:</p>
<pre class="literallayout">[1,2,5,22,3,8]</pre>

<p>In the vector space model, each number in the vector is the <em>weight</em> of a term,
as calculated with <a class="xref" href="scoring-theory.html#tfidf" title="Term Frequency/Inverse Document Frequency (TF/IDF)">term frequency/inverse document frequency</a>.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>While TF/IDF is the default way of calculating term weights for the vector
space model, it is not the only way.  Other models like Okapi-BM25 exist and
are available in Elasticsearch.  TF/IDF is the default because it is a
simple, efficient algorithm that produces high-quality search results and
has stood the test of time.</p>
</div>
</div>
<p>Imagine that we have a query for “happy hippopotamus.”  A common word like
<code class="literal">happy</code> will have a low weight, while an uncommon term like <code class="literal">hippopotamus</code>
will have a high weight. Let’s assume that <code class="literal">happy</code> has a weight of 2 and
<code class="literal">hippopotamus</code> has a weight of 5.  We can plot this simple two-dimensional
vector—<code class="literal">[2,5]</code>—as a line on a graph starting at point (0,0) and
ending at point (2,5), as shown in <a class="xref" href="scoring-theory.html#img-vector-query" title="A two-dimensional query vector for “happy hippopotamus” represented">Figure 27, “A two-dimensional query vector for “happy hippopotamus” represented”</a>.</p>
<div id="img-vector-query" class="imageblock">
<div class="content">
<img src="images/elas_17in01.png" alt="The query vector plotted on a graph">
</div>
<div class="title">Figure 27. A two-dimensional query vector for “happy hippopotamus” represented</div>
</div>
<p>Now, imagine we have three documents:</p>
<div class="olist orderedlist">
<ol class="orderedlist">
<li class="listitem">
I am <em>happy</em> in summer.
</li>
<li class="listitem">
After Christmas I’m a <em>hippopotamus</em>.
</li>
<li class="listitem">
The <em>happy hippopotamus</em> helped Harry.
</li>
</ol>
</div>
<p>We can create a similar vector for each document, consisting of the weight of
each query term—<code class="literal">happy</code> and <code class="literal">hippopotamus</code>—that appears in the
document, and plot these vectors on the same graph, as shown in <a class="xref" href="scoring-theory.html#img-vector-docs" title="Query and document vectors for “happy hippopotamus”">Figure 28, “Query and document vectors for “happy hippopotamus””</a>:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
Document 1: <code class="literal">(happy,____________)</code>—<code class="literal">[2,0]</code>
</li>
<li class="listitem">
Document 2: <code class="literal">( ___ ,hippopotamus)</code>—<code class="literal">[0,5]</code>
</li>
<li class="listitem">
Document 3: <code class="literal">(happy,hippopotamus)</code>—<code class="literal">[2,5]</code>
</li>
</ul>
</div>
<div id="img-vector-docs" class="imageblock">
<div class="content">
<img src="images/elas_17in02.png" alt="The query and document vectors plotted on a graph">
</div>
<div class="title">Figure 28. Query and document vectors for “happy hippopotamus”</div>
</div>
<p>The nice thing about vectors is that they can be compared. By measuring the
angle between the query vector and the document vector, it is possible to
assign a relevance score to each document. The angle between document 1 and
the query is large, so it is of low relevance.  Document 2 is closer to the
query, meaning that it is reasonably relevant, and document 3 is a perfect
match.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>In practice, only two-dimensional vectors (queries with two terms) can  be
plotted easily on a graph. Fortunately, <em>linear algebra</em>—the branch of
mathematics that deals with vectors—​provides tools to compare the
angle between multidimensional vectors, which means that we can apply the
same principles explained above to queries that consist of many terms.</p>
<p>You can read more about how to compare two vectors by using <a href="http://en.wikipedia.org/wiki/Cosine_similarity" class="ulink" target="_top"><em>cosine similarity</em></a>.</p>
</div>
</div>
<p>Now that we have talked about the theoretical basis of scoring, we can move on
to see how scoring is implemented in Lucene.</p>
</div>

</div>
<div class="navfooter">
<span class="prev">
<a href="controlling-relevance.html">« Controlling Relevance</a>
</span>
<span class="next">
<a href="practical-scoring-function.html">Lucene’s Practical Scoring Function »</a>
</span>
</div>
</div>

                  <!-- end body -->
                </div>
                <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                  <div id="rtpcontainer" style="display: block;">
                    <div class="mktg-promo">
                      <h3>Most Popular</h3>
                      <ul class="icons">
                        <li class="icon-elasticsearch-white"><a href="https://www.elastic.co/webinars/getting-started-elasticsearch?baymax=default&amp;elektra=docs&amp;storm=top-video">Get Started with Elasticsearch: Video</a></li>
                        <li class="icon-kibana-white"><a href="https://www.elastic.co/webinars/getting-started-kibana?baymax=default&amp;elektra=docs&amp;storm=top-video">Intro to Kibana: Video</a></li>
                        <li class="icon-logstash-white"><a href="https://www.elastic.co/webinars/introduction-elk-stack?baymax=default&amp;elektra=docs&amp;storm=top-video">ELK for Logs &amp; Metrics: Video</a></li>
                      </ul>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </section>

        </div>


<div id="elastic-footer"></div>
<script src="https://www.elastic.co/elastic-footer.js"></script>
<!-- Footer Section end-->

      </section>
    </div>

<script src="/guide/static/jquery.js"></script>
<script type="text/javascript" src="/guide/static/docs.js"></script>
<script type="text/javascript">
  window.initial_state = {}</script>
  </body>
</html>
